Statistical Inference III

Chelsea Parlett-Pelleriti

Frequentist Confidence Intervals

For a frequentist, a confidence interval is an interval estimate that includes a given % of our estimated sampling distribution (e.g. 90%, 95%, 89%).

  • Point Estimate: A single value estimate of a parameter (e.g., sample mean \(\bar{x}\)).
  • Interval Estimate: A range of values within which the parameter is expected to lie with a certain level of confidence.

Frequentist Confidence Intervals

The general form of a confidence interval is:

\[ \text{Point Estimate} \pm \text{Margin of Error} \]

Interpretation of Confidence Intervals

Remember: for Frequentists, the parameters \(\theta\) are fixed, and samples of data \(X\) are random.

Frequentists always imagine what would happen if we took infinitely many random samples of size \(n\).

If we used these theoretical samples to calculate a CI, we’d expect the long-run proportion of these intervals that contain the true population parameter \(\theta\) to be \(1-\alpha\) (the confidence level, e.g. 90%).

Interpretation of Confidence Intervals

The confidence we have is in the procedure we use to generate the CIs: if the assumptions we’re making are true, \(1-\alpha\) of the CIs generated this way will contain \(\theta\).

We don’t know whether our confidence interval is one of those \(1-\alpha\) CIs, but if we act as if it does contain \(\theta\), in the long run we’ll be wrong only \(\alpha\) of the time.

Interpretation of Confidence Intervals

  • \((lower, upper)\) are all guesses for \(\theta\) that are reasonable based on the data we’ve seen.

  • \((lower, upper)\) are values of \(\theta\) that are compatible with the data we’ve seen.

  • We are \((1-\alpha) \times 100\%\) confident that \(\theta\) is between \((lower, upper)\)

Note on Frequentism

You’ll see the phrase “act as if” a lot when I talk about Frequentist statistics. Frequentist methods allow you to define procedures that control the long-run error rate.

If we act as if our CI contains \(\theta\), we expect to be wrong only \(\alpha\) of the time. We choose \(\alpha\).

Constructing Confidence Intervals

A confidence interval depends on:

  • The sampling distribution of the estimator

    • \(\sigma\): The variability in the data
    • \(n\): The number of data points
  • \(1-\alpha\): The desired confidence level

Constructing Confidence Intervals

Shiny App

Confidence Interval for the Mean (Known Variance)

When the population variance \(\sigma^2\) is known and the sample size is large (\(n \geq 30\)), the confidence interval for the mean \(\mu\) is:

\[ \underbrace{\bar{x}}_\text{Point Est} \pm \underbrace{z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)}_\text{Margin of Error} \]

Where:

  • \(\bar{x}\): Sample mean
  • \(z_{\alpha/2}\): Critical value from the standard normal distribution
  • \(n\): Sample size
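A minimal sketch of this formula in R (the data and \(\sigma\) below are made up for illustration):

```r
# Sketch: 95% z-based CI for the mean, assuming sigma is known
x <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5)
sigma <- 0.5                     # assumed known population sd
n <- length(x)
z <- qnorm(1 - 0.05/2)           # critical value, ~1.96
moe <- z * sigma / sqrt(n)       # margin of error
c(mean(x) - moe, mean(x) + moe)  # (lower, upper), about (4.84, 5.46)
```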

Confidence Interval for the Mean (Known Variance)

\[ \underbrace{\bar{x}}_\text{Point Est} \pm \underbrace{z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)}_\text{Margin of Error} \]

\(\bar{x}\) is the center of our sampling distribution, so it is also the center of our CI.

Confidence Interval for the Mean (Known Variance)

\[ \underbrace{\bar{x}}_\text{Point Est} \pm \underbrace{z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)}_\text{Margin of Error} \]

\(\frac{\sigma}{\sqrt{n}}\), the standard error, is the standard deviation of the sampling distribution.

\(z_{\alpha/2}\) is the z-score of the \(\frac{\alpha}{2}\) quantile (e.g. when \(\alpha = 0.05\) we look for the z-scores at the \(0.025\) and \(0.975\) quantiles).

\(z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)\) tells us how far away the upper and lower bounds are from the point estimate.

Confidence Interval for the Mean (Known Variance)

\[ 1-\alpha = 0.95; \quad z_{\alpha/2} = 1.96 \]
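The critical value comes from the standard normal quantile function; in R:

```r
# z critical values for common confidence levels
qnorm(1 - 0.05/2)   # 95% -> 1.959964
qnorm(1 - 0.10/2)   # 90% -> 1.644854
qnorm(1 - 0.01/2)   # 99% -> 2.575829
```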

Confidence Interval for the Mean (Unknown Variance)

When the population variance is unknown and the sample size is small, we use the t-distribution:

\[ \bar{x} \pm t_{\alpha/2, n-1} \left( \frac{s}{\sqrt{n}} \right) \]

Where:

  • \(s\): Sample standard deviation
  • \(t_{\alpha/2, n-1}\): Critical value from the t-distribution with \(n-1\) degrees of freedom

Confidence Interval for the Mean (Unknown Variance)

Remember: \(t\) distributions have heavy tails. When we use an estimate for the standard deviation, we’re more uncertain about our estimates because there are two sources of uncertainty:

  • uncertainty about the sample mean \(\bar{x}\)

  • uncertainty about the sample standard deviation \(s\)

The \(t\) distribution better represents this added uncertainty.
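A quick comparison in R shows the heavier tails: \(t\) critical values exceed the \(z\) value, and shrink toward it as the degrees of freedom grow.

```r
# t critical values shrink toward the z value as df grows
qnorm(0.975)          # 1.959964
qt(0.975, df = 5)     # 2.570582
qt(0.975, df = 30)    # 2.042272
qt(0.975, df = 1000)  # ~1.962, almost identical to z
```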

Example: Confidence Interval for a Mean

Suppose we have the following data representing the weights (in kg) of a sample of 20 gorillas that get an extra 🍌 each day.

weights <- c(70, 72, 68, 65, 74, 69, 71, 73, 67, 66, 75, 68, 70, 72, 69, 71, 74, 68, 66, 73)
n <- length(weights)
sample_mean <- mean(weights)
sample_sd <- sd(weights)
# known_sd <- 3
upper <- sample_mean + qt(0.975,
                          df = n - 1) * sample_sd/sqrt(n)
lower <- sample_mean + qt(0.025,
                          df = n - 1) * sample_sd/sqrt(n)
print(paste0("95% CI: (", round(lower,2), " , ", round(upper,2), ")"))
[1] "95% CI: (68.67 , 71.43)"

Other gorillas in the zoo have a mean weight of 65 kg. Is there any evidence that gorillas that get an extra 🍌 have a higher mean weight? Discuss.

Evaluating Confidence Intervals

A good confidence interval should:

  • have good coverage

  • be precise

Evaluating Confidence Intervals

Coverage

  • Nominal Coverage: \(1-\alpha\)

  • Actual Coverage: \(P(lb \leq \theta \leq ub)\)

Coverage should be at least \(1-\alpha\). We might want to test coverage under a range of situations (e.g. small sample sizes, skewed distributions, etc.).
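One way to check actual coverage is by simulation: repeatedly draw samples from a known population, build the CI each time, and count how often it contains the true \(\mu\). A sketch (the values of \(\mu\), \(\sigma\), and \(n\) here are made up for illustration):

```r
# Sketch: estimate actual coverage of the z-interval by simulation
set.seed(1)
mu <- 10; sigma <- 2; n <- 15; alpha <- 0.05
covered <- replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)      # a new sample each time
  moe <- qnorm(1 - alpha/2) * sigma / sqrt(n)
  (mean(x) - moe <= mu) & (mu <= mean(x) + moe)
})
mean(covered)  # actual coverage; should be close to 0.95 here
```

With normal data and known \(\sigma\), actual coverage matches nominal coverage; rerunning the simulation with skewed data or smaller \(n\) is where the gaps show up.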

Many statistical properties are asymptotic, and we are never at the asymptote.

Evaluating Confidence Intervals

Precision

There may be many intervals where \(P(lb \leq \theta \leq ub) = 1-\alpha\).

Choose the narrowest one.
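For example, for a normal sampling distribution there are many pairs of quantiles that capture 95% of the distribution, but the equal-tailed pair gives the narrowest interval. A quick sketch in R:

```r
# Both intervals capture 95% of a standard normal,
# but the equal-tailed one is narrower
qnorm(0.975) - qnorm(0.025)  # central:    width ~3.92
qnorm(0.960) - qnorm(0.010)  # off-center: width ~4.08
```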

Evaluating Confidence Intervals

Precision

❓ Why would a narrower CI (assuming a constant \(1-\alpha\) confidence level) be useful?

Bootstrapped Confidence Intervals

Bootstrapping is a resampling technique that approximates a sampling distribution by sampling from a sample with replacement.

By sampling with replacement, we’re treating the sample as an approximate population, and sampling from it.

What is Bootstrapping?

  • Sample with replacement from the original data.
  • Generate “new” samples (bootstrap samples) of the same size as the original dataset.
  • Calculate the statistic (e.g., mean, median) on each bootstrap sample.
ages <- c(18,27,27,19,25,23,26,23,20,23,
          19,17,21,19,18,18,21,18,20,25,
          23,17,23,18,23,25,23,25,19,27)

boot_sampling_dist <- replicate(n = 1000, # 1000 boot samples
                      expr = mean(sample(ages, # sample from ages
                      size = length(ages), # same sample size
                      replace = TRUE))) # w replacement

What is Bootstrapping?

What is Bootstrapping?

# 90% CI
round(quantile(boot_sampling_dist, probs = c(0.05, 0.95)),2)
   5%   95% 
20.70 22.67 
# 50% CI
round(quantile(boot_sampling_dist, probs = c(0.25, 0.75)),2)
  25%   75% 
21.26 22.07 

Why Use Bootstrapping?

  • No assumptions about the distribution of the data
  • Flexibility in estimating various statistics

Why Use Bootstrapping?

\[ \text{ROI} = \frac{\text{conversions}}{\text{spend}}; \text{CPA} = \frac{\text{spend}}{\text{conversions}} \]

rois <- c(0.59,0.96,0.3,0.61,0.07,0.17,0.19,0.25,0.13,0.74,0.66,0.24,0.11,0.93,0.04,0.58,0.05,0.57,0.62,0.1)

boot_sampling_dist <- replicate(n = 1000, # 1000 boot samples
                                expr = mean(sample(rois, # sample from rois
                                                   size = length(rois), # same sample size
                                                   replace = TRUE))) # w replacement

But what if I want a 75% CI for CPA?

Why Use Bootstrapping?

boot_sampling_dist_cpa <- 1/boot_sampling_dist

# 75% CI
round(quantile(boot_sampling_dist_cpa, probs = c(0.125, 0.875)),2)
12.5% 87.5% 
 2.12  3.10 

Why NOT Use Bootstrapping?

  • Computationally Expensive: Requires a large number of resamples.

  • Sample Quality Dependent: Quality of results depends on the quality of original sample.

  • Edge Cases: May not perform well with very small sample sizes or extreme distributions.

Bootstrapping in R

library(boot)
# fake data
set.seed(123)
data <- rnorm(50, mean = 5, sd = 2)

# statistic
mean_sq_function <- function(data, indices) {
  sample_data <- data[indices]
  return(mean(sample_data)**2)
}

# bootstrap
bootstrap_results <- boot(data, mean_sq_function, R = 1000)

# ci
ci <- boot.ci(bootstrap_results, type = "basic")
print(ci)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = bootstrap_results, type = "basic")

Intervals : 
Level      Basic         
95%   (20.39, 30.47 )  
Calculations and Intervals on Original Scale

Using Confidence Intervals

❓ when choosing a confidence level, how would you decide? Are there tradeoffs of choosing a high vs. low confidence level?

Confidence Interval Overlap

We have a depression score that ranges from \(-3 \to 3\). Based on clinical research, people’s QOL is noticeably changed if we can change their depression score by \(0.25\) in either direction.

After doing corgi therapy, the 95% CI for the change in people’s depression scores is:

\[ (-0.05, 0.35) \]

❓if any change within \(\pm 0.25\) is clinically irrelevant, what does this CI tell us about corgi therapy?
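One way to reason about this in R (a sketch; the \(\pm 0.25\) bounds come from the clinical threshold above):

```r
# Sketch: compare the 95% CI to the clinically irrelevant range
ci    <- c(-0.05, 0.35)  # 95% CI for the change in depression score
irrel <- c(-0.25, 0.25)  # changes in this range are clinically irrelevant

# Is the whole CI inside the irrelevant range?
ci[1] >= irrel[1] & ci[2] <= irrel[2]  # FALSE: values above 0.25 are plausible
```

Because the CI pokes past the \(0.25\) bound (but also includes \(0\) and the whole irrelevant range), the data are compatible with both a clinically relevant improvement and no meaningful effect.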

Read more here, but a warning: 2GPV (second-generation p-values) is not a widely accepted practice at the moment; however, it shares a lot of ideas and properties with Equivalence Testing, which we’ll discuss later.